1. Technical Field
The invention relates generally to the management of content, and more particularly, to an improved solution for managing key terms in the content.
2. Background Art
In information management, it is frequently desirable to extract key terms from content, such as a document. Term extraction benefits both content authoring (e.g., generating a glossary or index, checking term consistency, identifying inappropriate terms, etc.) and content translation (e.g., advanced translation of key terms). To date, many key term extraction tools identify a substantial number of insignificant extra words, duplicate terms, and/or strings. Further, these tools frequently misidentify terms. The inclusion of these superfluous lexical units in the output places a substantial burden on the reviewer, who must eliminate them, thereby reducing the usability of the output and adding to the cost and time required to complete the key term extraction.
In response, an improved key term extraction tool was created by International Business Machines Corp. of Armonk, N.Y. (IBM). As described in the paper entitled “Terminology Extraction for Global Content Management”, Terminology, September 2003, vol. 9, no. 1, pp. 51-69, which is hereby incorporated herein by reference, the tool scans a file and extracts nouns and noun phrases, along with other information, and includes them in a list. However, while the output of this tool is more useful than that of previous tools, it continues to include a substantial number of extra lexical units.
As a result, a need exists for an improved key term extraction tool and process that further reduces the inclusion, in the output, of lexical units that are not needed for the intended content authoring and/or translation purposes.